Multilevel inclusion in multilevel cache hierarchies.

A method for achieving multilevel inclusion in a computer system with first and second level caches.
The caches align themselves on a "way" basis by their respective cache controllers communicating
with each other as to which blocks of data they are replacing and which of their cache ways are being
filled with data. On first and second level cache read misses the first level cache controller provides way
information to the second level cache controller to allow received data to be placed in the same way. On
first level cache read misses and second level cache read hits, the second level cache controller
provides way information to the first level cache controller, which ignores its replacement indication and
places data in the indicated way. On processor writes the first level cache controller caches the writes
and provides way information to the second level cache controller, which also caches the writes and
uses the way information to select the proper way for data storage. An inclusion bit is set on data in the
second level cache that is duplicated in the first level cache. Multilevel inclusion allows the second level
cache controller to perform the principal snooping responsibilities for both caches, thereby enabling
the first level cache controller to avoid snooping duties until a first level cache snoop hit occurs. On a
second level cache snoop hit, the second level cache controller checks the respective inclusion bit to
determine if a copy of this data also resides in the first level cache. The first level cache controller is
directed to snoop the bus only if the respective inclusion bit is set.

The present invention relates to microprocessor
cache subsystems in computer systems, and more
specifically to a method for achieving multilevel inclusion
among first level and second level caches in a
computer system so that the second level cache controller
can perform the principal snooping responsibilities
for both caches.

The personal computer industry is a vibrant and
growing field that continues to evolve as new innovations
occur. The driving force behind this innovation
has been the increasing demand for faster and more
powerful computers. A major bottleneck in personal
computer speed has historically been the speed with
which data can be accessed from memory, referred to
as the memory access time. The microprocessor, with
its relatively fast processor cycle times, has generally
been delayed by the use of wait states during memory
accesses to account for the relatively slow memory
access times. Therefore, improvement in memory
access times has been one of the major areas of
research in enhancing computer performance.

In order to bridge the gap between fast processor
cycle times and slow memory access times, cache
memory was developed. A cache is a small amount of
very fast and expensive, zero wait state memory that
is used to store a copy of frequently accessed code
and data from main memory. The microprocessor can
operate out of this very fast memory and thereby
reduce the number of wait states that must be interposed
during memory accesses. When the processor
requests data from memory and the data resides in
the cache, then a cache read hit takes place, and the
data from the memory access can be returned to the
processor from the cache without incurring wait
states. If the data is not in the cache, then a cache
read miss takes place, and the memory request is forwarded
to the system and the data is retrieved from
main memory, as would normally be done if the cache
did not exist. On a cache miss, the data that is retrieved
from memory is provided to the processor and
is also written into the cache due to the statistical
likelihood that this data will be requested again by the
processor.

An efficient cache yields a high "hit rate", which
is the percentage of cache hits that occur during all
memory accesses. When a cache has a high hit rate,
the majority of memory accesses are serviced with
zero wait states. The net effect of a high cache hit rate
is that the wait states incurred on a relatively infrequent
miss are averaged over a large number of zero
wait state cache hit accesses, resulting in an average
of nearly zero wait states per access. Also, since a
cache is usually located on the local bus of the microprocessor,
cache hits are serviced locally without
requiring use of the system bus. Therefore, a processor
operating out of its local cache has a much
lower "bus utilization." This reduces system bus
bandwidth used by the processor, making more
bandwidth available for other bus masters.
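
As a rough numerical illustration of the averaging effect described
above, the following sketch computes the mean number of wait states
per access. The 95% hit rate and the 4 wait state miss penalty are
hypothetical values chosen only for illustration; neither figure is
taken from this description.

```python
def average_wait_states(hit_rate, miss_wait_states, hit_wait_states=0):
    """Mean wait states per access for a hit rate between 0.0 and 1.0."""
    miss_rate = 1.0 - hit_rate
    return hit_rate * hit_wait_states + miss_rate * miss_wait_states


# Hypothetical figures: 95% hit rate, 4 wait states on each miss.
avg = average_wait_states(0.95, 4)
print(f"average wait states per access: {avg:.2f}")
```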

Another important feature of caches is that the
processor can operate out of its local cache when it
does not have control of the system bus, thereby
increasing the efficiency of the computer system. In
systems without microprocessor caches, the processor
generally must remain idle while it does not
have control of the system bus. This reduces the overall
efficiency of the computer system because the processor
cannot do any useful work at this time.
However, if the processor includes a cache placed on
its local bus, it can retrieve the necessary code and
data from its cache to perform useful work while other
devices have control of the system bus, thereby
increasing system efficiency.

Cache performance is dependent on many factors,
including the hit rate and the cache memory
access time. The hit rate is a measure of how efficient
a cache is in maintaining a copy of the most frequently
used code and data, and, to a large extent, it is a function
of the size of the cache. A larger cache will generally
have a higher hit rate than a smaller cache.
Increasing the size of the cache, however, can possibly
degrade the cache memory access time. However,
cache designs for a larger cache can be
achieved using cache memory with the fastest possible
access times such that the limiting factor in the
design is the minimum CPU access time. In this way,
a larger cache would not be penalized by a possibly
slower cache memory access time with respect to the
memory access time of a smaller cache because the
limiting factor in the design would be the minimum
CPU access time.

Other important considerations in cache perform- 
ance are the organization of the cache and the cache 
management policies that are employed in the cache. 
A cache can generally be organized into either a di- 
rect-mapped or set-associative configuration. In a di- 
rect-mapped organization, the physical address 
space of the computer is conceptually divided up into 
a number of equal pages, with the page size equaling 
the size of the cache. The cache is divided up into a 
number of sets, with each set having a certain number 
of lines. Each of the pages in main memory has a 
number of lines equivalent to the number of lines in 
the cache, and each line from a respective page in 
main memory corresponds to a similarly located line 
in the cache. An important characteristic of a direct-
mapped cache is that each memory line from a page
in main memory, referred to as a page offset, can only
reside in the equivalently located line or page offset
in the cache. Due to this restriction, the cache only
need refer to a certain number of the upper address
bits of a memory address, referred to as a tag, to
determine if a copy of the data from the respective
memory address resides in the cache because the
lower order address bits are pre-determined by the
page offset of the memory address.
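
The tag comparison described above can be sketched as follows. This
is a minimal model, not the circuitry of any particular cache
controller: the class name DirectMappedCache, its methods, and the
use of line addresses (rather than byte addresses) are assumptions
made for illustration.

```python
class DirectMappedCache:
    """Toy direct-mapped cache directory indexed by page offset."""

    def __init__(self, num_lines):
        self.num_lines = num_lines        # lines per conceptual page
        self.tags = [None] * num_lines    # None means no valid entry

    def split(self, line_address):
        # Upper part is the tag (page number), lower part the page offset.
        return line_address // self.num_lines, line_address % self.num_lines

    def lookup(self, line_address):
        tag, offset = self.split(line_address)
        return self.tags[offset] == tag   # only one place the line can be

    def fill(self, line_address):
        tag, offset = self.split(line_address)
        self.tags[offset] = tag           # evicts whatever was there
```

Two lines that share a page offset but come from different pages
compete for the same cache location, so filling one evicts the other.
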

Whereas a direct-mapped cache is organized as
one bank of memory that is equivalent in size to a conceptual
page in main memory, a set-associative
cache includes a number of banks, or ways, of memory
that are each equivalent in size to a conceptual
page in main memory. Accordingly, a page offset in
main memory can be mapped to a number of locations
in the cache equal to the number of ways in the cache.
For example, in a 4-way set-associative cache, a line
or page offset from main memory can reside in the
equivalent page offset location in any of the four ways
of the cache.

A set-associative cache generally includes a replacement
algorithm that determines which bank, or
way, with which to fill data when a read miss occurs.
Many set-associative caches use some form of a least
recently used (LRU) algorithm that places new data in
the way that was least recently accessed. This is
because, statistically, the way most recently used or
accessed to provide data to the processor is the one
most likely to be needed again in the future. Therefore,
the LRU algorithm ensures that the block which
is replaced is the least likely to have data requested
by the cache.

Cache management is generally performed by a
device referred to as a cache controller. The cache
controller includes a directory that holds an
associated entry for each set in the cache. This entry
generally has three components: a tag, a tag valid bit,
and a number of line valid bits equaling the number of
lines in each cache set. The tag acts as a main memory
page number, and it holds the upper address bits
of the particular page in main memory from which the
copy of data residing in the respective set of the cache
originated. The status of the tag valid bit determines
whether the data in the respective set of the cache is
considered valid or invalid. If the tag valid bit is clear,
then the entire set is considered invalid. If the tag valid
bit is true, then an individual line within the set is considered
valid or invalid depending on the status of its
respective line valid bit.

A principal cache management policy is the preservation
of cache coherency. Cache coherency refers
to the requirement that any copy of data in a cache
must be identical to (or actually be) the owner of that
location's data. The owner of a location's data is generally
defined as the respective location having the
most recent version of the data residing in the respective
memory location. The owner of data can be either
an unmodified location in main memory, or a modified
location in a write-back cache. In computer systems
where independent bus masters can access memory,
there is a possibility that a bus master, such as a direct
memory access controller, network or disk interface
card, or video graphics card, might alter the contents
of a main memory location that is duplicated in the
cache. When this occurs, the cache is said to hold
"stale" or invalid data. In order to maintain cache
coherency, it is necessary for the cache controller to
monitor the system bus when the processor does not
own the system bus to see if another bus master
accesses main memory. This method of monitoring
the bus is referred to as snooping.

The cache controller must monitor the system
bus during memory reads by a bus master in a write-back
cache design because of the possibility that a
previous processor write may have altered a copy of
data in the cache that has not been updated in main
memory. This is referred to as read snooping. On a
read snoop hit where the cache contains data not yet
updated in main memory, the cache controller generally
provides the respective data to main memory, and
the requesting bus master generally reads this data
en route from the cache controller to main memory,
this operation being referred to as snarfing. The cache
controller must also monitor the system bus during
memory writes because the bus master may write to
or alter a memory location that resides in the cache.
This is referred to as write snooping. On a write snoop
hit, the cache entry is either marked invalid in the
cache directory by the cache controller, signifying that
this entry is no longer correct, or the cache is updated
along with main memory. Therefore, when a bus master
reads or writes to main memory in a write-back
cache design, or writes to main memory in a write-through
cache design, the cache controller must latch
the system address and perform a cache look-up in
the tag directory corresponding to the page offset
location where the memory access occurred to see if
the main memory location being accessed also
resides in the cache. If a copy of the data from this
location does reside in the cache, then the cache controller
takes the appropriate action depending on
whether a read or write snoop hit has occurred. This
prevents incompatible data from being stored in main
memory and the cache, thereby preserving cache
coherency.
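
The read and write snooping behavior described above can be
sketched for a write-back cache as follows. The class name
SnoopingCache, the dictionary layout, and the choice of invalidating
(rather than updating) on a write snoop hit are assumptions made for
illustration only.

```python
class SnoopingCache:
    """Toy write-back cache that reacts to other bus masters' accesses."""

    def __init__(self):
        # address -> {"data": value, "dirty": bool}
        self.lines = {}

    def snoop_read(self, address, memory):
        """Read snoop: supply modified data that memory does not yet hold."""
        line = self.lines.get(address)
        if line is not None and line["dirty"]:
            memory[address] = line["data"]  # update main memory
            line["dirty"] = False
            return line["data"]             # requester may take it en route
        return None                         # memory already owns the data

    def snoop_write(self, address):
        """Write snoop: invalidate a stale copy. Returns True on a hit."""
        return self.lines.pop(address, None) is not None
```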

Another consideration in the preservation of
cache coherency is the handling of processor writes
to memory. When the processor writes to main memory,
the memory location must be checked to determine
if a copy of the data from this location also
resides in the cache. If a processor write hit occurs in
a write-back cache design, then the cache location is
updated with the new data, and main memory may be
updated with the new data at a later time or should the
need arise. In a write-through cache, the main memory
location is generally updated in conjunction with
the cache location on a processor write hit. If a processor
write miss occurs, the cache controller may
ignore the write miss in a write-through cache design
because the cache is unaffected in this design. Alternatively,
the cache controller may perform a "write-allocate"
whereby the cache controller allocates a
new line in the cache in addition to passing
the data to the main memory. In a write-back cache
design, the cache controller generally allocates a new
line in the cache when a processor write miss occurs.
This generally involves reading the remaining entries
to fill the line from main memory before or jointly with
providing the write data to the cache. Main memory is
updated at a later time should the need arise.

Caches have generally been designed independently
of the microprocessor. The cache is placed on
the local bus of the microprocessor and interfaced between
the processor and the system bus during the
design of the computer system. However, with the
development of higher transistor density computer
chips, many processors are currently being designed
with an on-chip cache in order to meet performance
goals with regard to memory access times. The on-chip
cache used in these processors is generally
small, an exemplary size being 8 kbytes. The
smaller, on-chip cache is generally faster than a large
off-chip cache and reduces the gap between fast processor
cycle times and the relatively slow access
times of large caches.

In computer systems that utilize processors with
on-chip caches, an external, second level cache is
often added to the system to further improve memory
access time. The second level cache is generally
much larger than the on-chip cache, and, when used
in conjunction with the on-chip cache, provides a greater
overall hit rate than the on-chip cache would provide
by itself.

In systems that incorporate multiple levels of
caches, when the processor requests data from memory,
the on-chip or first level cache is first checked to
see if a copy of the data resides there. If so, then a first
level cache hit occurs, and the first level cache provides
the appropriate data to the processor. If a first
level cache miss occurs, then the second level cache
is then checked. If a second level cache hit occurs,
then the data is provided from the second level cache
to the processor. If a second level cache miss occurs,
then the data is retrieved from main memory. Write
operations are similar, with mixing and matching of the
operations discussed above being possible.
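
The read sequence described above can be sketched as a simple
two-level lookup. The function name read and the use of dictionaries
for the caches and main memory are assumptions for illustration.
Filling the first level cache on a second level hit, and filling both
caches on a miss, is likewise an assumption of this sketch rather
than something this paragraph states.

```python
def read(address, l1, l2, memory):
    """Return (data, outcome) for a processor read through two caches."""
    if address in l1:
        return l1[address], "L1 hit"
    if address in l2:
        l1[address] = l2[address]   # assumed: copy the data into L1
        return l1[address], "L2 hit"
    data = memory[address]          # both caches missed
    l2[address] = data              # assumed: fill both levels
    l1[address] = data
    return data, "miss"
```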

In multilevel cache systems, it has generally been
necessary for each cache to snoop the system bus
during memory writes by other bus masters in order
to maintain cache coherency. When the microprocessor
does not have control of the system bus, the cache
controllers of both the first level and second level
caches are required to latch the address of every
memory write and check this address against the tags
in their cache directories. This considerably impairs the
efficiency of the processor working out of its on-chip
cache during this time because it is continually being
interrupted by the snooping efforts of the cache controller
of the on-chip cache. Therefore, the requirement
that the cache controller of the on-chip cache
snoop the system bus for every memory write degrades
system performance because it prevents the
processor from efficiently operating out of its on-chip
cache while it does not have control of the system
bus.

In many instances where multilevel cache hierarchies
exist with multiple processors, a property referred
to as multilevel inclusion is desired in the
hierarchy. Multilevel inclusion provides that the second
level cache is guaranteed to have a copy of what
is inside the first level, or on-chip cache. When this
occurs, the second level cache is said to hold a
superset of the first level cache. Multilevel inclusion
has mostly been used in multi-processor systems to
prevent cache coherency problems. When multilevel
inclusion is implemented in multi-processor systems,
the higher level caches can shield the lower level
caches from cache coherency problems and thereby
prevent unnecessary blind checks and invalidations
that would otherwise occur in the lower level caches
if multilevel inclusion were not implemented.

The present invention includes a method for
achieving multilevel inclusion among first and second
level caches in a computer system. Multilevel inclusion
obviates the necessity of the cache controller of
the first level cache to snoop the system bus for every
memory write that occurs while the processor is not in
control of the system bus because the cache controller
of the second level cache can assume this duty for
both caches. This frees up the first level cache controller
and thereby allows the microprocessor to operate
more efficiently out of the first level cache when it
does not have control of the system bus.

The second level cache preferably has a number
of ways equal to or greater than the number of ways
in the first level cache. The first level and second level
caches are 4-way set-associative caches in the preferred
embodiment of the present invention. In this
embodiment there is a one-to-one correspondence
between the cache ways in the first level cache and
the cache ways in the second level cache. During a
first level cache line fill from main memory, the first
level cache controller communicates to the second
level cache controller the particular first level cache
way in which the data is to be placed so that the second
level cache controller can place the data in the
corresponding second level cache way. When the
second level cache controller is transmitting a copy of
data to the first level cache controller, the second level
cache controller informs the first level cache controller
which second level cache way the data is coming
from. The first level cache controller disregards its
normal replacement algorithm and fills the corresponding
first level cache way. In this manner, the first
and second level caches align themselves on a "way"
basis. This "way" alignment prevents the second
level cache controller from placing data in a different
way than the first level cache and in the process possibly
discarding data that resides in the first level
cache.
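
The exchange of way information described above can be sketched as
follows, assuming both caches have the same number of ways so that
the correspondence is one-to-one. The class WayAlignedCache, the two
helper functions, and the modulo set indexing on line addresses are
hypothetical simplifications, not the controllers' actual interface.

```python
class WayAlignedCache:
    """Toy set-associative cache that records only which line is where."""

    def __init__(self, num_ways, num_sets):
        self.num_sets = num_sets
        # ways[w][s] holds the line address cached in way w, set s.
        self.ways = [[None] * num_sets for _ in range(num_ways)]

    def find_way(self, line_address):
        s = line_address % self.num_sets
        for w, way in enumerate(self.ways):
            if way[s] == line_address:
                return w
        return None

    def place(self, line_address, way):
        self.ways[way][line_address % self.num_sets] = line_address


def fill_from_memory(l1, l2, line_address, l1_victim_way):
    """L1 chooses the way and reports it, so L2 uses the same way."""
    l1.place(line_address, l1_victim_way)
    l2.place(line_address, l1_victim_way)


def fill_from_l2(l1, l2, line_address):
    """L2 reports its way; L1 overrides its own replacement choice.

    Assumes the line is present in L2 (a second level read hit).
    """
    way = l2.find_way(line_address)
    l1.place(line_address, way)
    return way
```
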

The cache organization of the first level cache
according to the present invention is a write-through
architecture. On a processor write, the information is
preferably written to the first level cache, regardless
of whether a write hit or write miss occurs, and external
write bus cycles are initiated which write the information
to the second level cache. The first level cache
broadcasts the particular first level cache way where
the data was placed to the second level cache controller
so that the second level cache controller can place
the data in the corresponding second level cache
way, thereby retaining the "way" alignment. The second
level cache is preferably a write-back cache
according to the preferred embodiment, but could be
a write-through cache if desired.

The second level cache controller utilizes an
inclusion bit with respect to each line of data in the
second level cache in order to remember whether a
copy of this data also resides in the first level cache.
When a location in the first level cache is replaced,
whether concurrently with a second level cache replacement
from memory or directly from the second
level cache, the second level cache controller sets an
inclusion bit for that location in the second level cache
to signify that a copy of this data is duplicated in the
first level cache. When this occurs, all other locations
in the second level cache that correspond to the same
location in the first level cache have their inclusion bits
cleared by the second level cache controller to signify
that the data held in these locations does not reside
in the first level cache.
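
The set-and-clear bookkeeping described above can be sketched as
follows. The class InclusionBits and its layout are hypothetical. The
sketch assumes way-aligned caches and assumes that the second level
set count is a multiple of the first level set count, so that the
second level locations sharing one first level location are those in
the same way whose set index is congruent modulo the first level set
count.

```python
class InclusionBits:
    """Toy inclusion-bit array kept by the second level controller."""

    def __init__(self, num_ways, l1_sets, l2_sets):
        self.l1_sets = l1_sets
        self.l2_sets = l2_sets
        self.bits = [[False] * l2_sets for _ in range(num_ways)]

    def l1_fill(self, way, l2_set):
        """Record that the L2 location (way, l2_set) is now copied in L1."""
        l1_set = l2_set % self.l1_sets
        # Clear every L2 location in this way that maps to the same
        # L1 location, since L1 can hold only one of them at a time.
        for s in range(l1_set, self.l2_sets, self.l1_sets):
            self.bits[way][s] = False
        # Then mark the one location that L1 now duplicates.
        self.bits[way][l2_set] = True
```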

The second level cache controller performs the
principal snooping duties for both caches when the
processor does not have control of the system bus.
When a write snoop hit occurs in the second level
cache, the inclusion bit is read by the second level
cache controller to see whether the first level cache
controller must also snoop the memory access. If the
inclusion bit is not set, then the first level cache controller
is left alone. If the inclusion bit is set, then the
second level cache controller directs the first level
cache controller to snoop that particular memory
access. In this manner, the first level cache controller
can neglect its snooping duties until the second level
cache controller determines that a write snoop hit on
the first level cache has actually occurred. This allows
the processor to operate more efficiently out of its first
level cache when it does not have control of the system
bus.
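
The filtering role of the inclusion bit described above can be
sketched as follows. The function snoop_write, the dictionary that
maps each address held in the second level cache to its inclusion
bit, and the l1_snoop callback are hypothetical names. What the
second level cache does with its own entry on a hit (invalidate or
update) is deliberately left out of the sketch.

```python
def snoop_write(l2_lines, address, l1_snoop):
    """Handle a bus master write seen by the second level controller.

    l2_lines maps each address held in L2 to its inclusion bit.
    Returns True on an L2 write snoop hit.
    """
    if address not in l2_lines:
        # L2 miss: with multilevel inclusion L1 cannot hold the data.
        return False
    if l2_lines[address]:
        # Inclusion bit set: only now is the L1 controller disturbed.
        l1_snoop(address)
    return True
```

Only accesses that hit a line whose inclusion bit is set reach the
first level controller; every other bus write is absorbed by the
second level controller.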

A better understanding of the invention can be
obtained when the following detailed description of
the preferred embodiment is considered in conjunction
with the following drawings, in which:

Figure 1 is a block diagram of a computer system
including first and second level caches and implementing
multilevel inclusion according to the present
invention;
Figure 2 depicts the organization of the 2-way
set-associative C1 cache of Figure 1;
Figure 3 depicts the organization of the 2-way
set-associative C2 cache of Figure 1;
Figures 4A and 4B depict a flowchart illustrating
the operation of cache read hits and misses
according to the present invention; and
Figure 5 is a flowchart illustrating the operation of
read and write snooping according to the present
invention.

Referring now to Figure 1, a computer system S
is generally shown. Many of the details of a computer
system that are not relevant to the present invention
have been omitted for the purpose of clarity. The computer
system S includes a microprocessor 20 that is
connected to a first level cache C1 that is preferably
located on the same chip 22 as the processor 20. The
chip 22 includes a C1 cache controller 30 that is connected
to the C1 cache and controls the operation of
the C1 cache. The processor 20, the first level cache
C1, and the first level cache controller 30 are connected
to a system bus 24 through a local processor bus
25. A second level cache C2 is connected to the local
processor bus 25. A second level cache controller,
referred to as the C2 cache controller 32, is connected
to the C2 cache and the local processor bus 25. Random
access memory 26, which is 4 Gigabytes in size
according to the present embodiment, and an intelligent
bus master 28 are connected to the system bus
24. The random access memory (RAM) 26 includes
a system memory controller (not shown) that controls
the operation of the RAM 26. The RAM 26 and the system
memory controller (not shown) are hereinafter
referred to as main memory 26. The system bus 24
includes a data bus and a 32-bit address bus, the
address bus including address bits A2 to A31, which
allows access to any of 2^30 32-bit doublewords in main
memory 26. The bus master 28 may be any of the type
that controls the system bus 24 when the processor
system is on hold, such as the system direct memory
access (DMA) controller, a hard disk interface, a local
area network (LAN) interface or a video graphics processor
system.

The C1 and C2 caches are aligned on a "way"
basis such that a copy of data placed in a particular
way in one of the caches can only be placed in a predetermined
corresponding way in the other cache.
This "way" alignment requires that the C2 cache have
at least as many cache ways as does the C1 cache.
If the C1 and C2 caches have the same number of
ways, then there is a one-to-one correspondence between
the cache ways in the C1 cache and the cache
ways in the C2 cache. If the C2 cache has more cache
ways than the C1 cache, then each cache way in the
C1 cache corresponds to one or more cache ways in
the C2 cache. However, no two C1 cache ways can
correspond to the same C2 cache way. This requirement
stems from the fact that each memory address
has only one possible location in each of the C1 and
C2 caches. Accordingly, if two C1 cache ways corresponded
to a single C2 cache way, then there would be
memory address locations residing in the C1 cache
that would be incapable of residing in the C2 cache.
The respective C2 cache way location would be
incapable of holding the two memory addresses
which would reside in each of the respective C1 cache
ways that corresponded to the respective C2 cache
way location.

The actual size of each of the caches is not
important for the purposes of the invention. However,
the C2 cache must be at least as large as the C1
cache to achieve multilevel inclusion, and the C2
cache is preferably at least four times as large as the
C1 cache to provide for an improved cache hit rate. In
the preferred embodiment of the present invention,
the C1 cache is 8 kbytes in size and the C2 cache is
preferably 512 kbytes in size. In this embodiment, the
C1 cache and the C2 cache are each 4-way set-associative
caches. In an alternate embodiment of the
present invention, the C1 and C2 caches are each 2-way
set-associative caches.

Referring now to Figures 2 and 3, conceptual
diagrams of the C1 and C2 caches with their respective
cache controllers 30 and 32 configured in a 2-way
set-associative organization are generally shown.
The following discussion is intended to provide an
introduction to the structure and operation of a set-associative
cache as well as the relationship between
the cache memory, cache directories, and main memory
26. The C1 and C2 caches are discussed with
reference to a 2-way set-associative cache organization
as a simpler example of the more complex 4-way
set-associative cache organization of the
preferred embodiment. The special cache controller
design considerations that arise in a 4-way set-associative
cache organization that do not occur in a
2-way set-associative organization are noted in the
following discussion.

The C1 cache includes two banks or ways of
memory, referred to as A1 and B1, which are each 4
kbytes in size. Each of the cache ways A1 and B1 is
organized into 128 sets, with each set including eight
lines 58 of memory storage. Each line includes one
32-bit doubleword, or four bytes of memory. Main
memory 26 is conceptually organized as 2^20 pages
with a page size of 4 kbytes, which is equivalent to the
size of each C1 cache way A1 and B1. Each conceptual
page in main memory 26 includes 1024 lines,
which is the same number of lines as each of the
cache ways A1 and B1 has. The unit of transfer between
the main memory 26 and the C1 cache is one line.

A particular line location, or page offset, from
each of the pages in main memory 26 maps to the
similarly located line in each of the cache ways A1 and
B1. For example, as shown in Figure 2, the page offset
from each of the pages in main memory 26 that is
shaded maps to the equivalently located, and shaded,
line offset in each of the cache ways A1 and B1. In this
way, a particular page offset memory location from
main memory 26 can only map to one of two locations
in the C1 cache, these locations being in each of the
cache ways A1 and B1.

Each of the cache ways A1 and B1 includes a
cache directory, referred to as directory DA1 and
directory DB1, respectively, that are located in the C1
cache controller 30 of the C1 cache. The directories
DA1 and DB1 each include one entry 60 and 62, respectively,
for each of the 128 sets in the respective
cache way A1 and B1. The cache directory entry for
each set has three components: a tag, a tag valid bit,
and eight line valid bits, as shown. The number of line
valid bits equals the number of lines in each set. The
20 bits in the tag field hold the upper address bits,
address bits A12 to A31, of the main memory address
location of the copy of data that resides in the respective
set of the cache. The upper address bits address
the appropriate 4 kbyte conceptual page in main
memory 26 where the data in the respective set of the
cache is located. The remaining address bits from this
main memory address location, address bits A2 to
A11, can be partitioned into a set address field comprising
seven bits, A5 to A11, which are used to select
one of the 128 sets in the C1 cache, and a line
address field comprising 3 bits, A2 to A4, which are
used to select an individual line from the eight lines in
the selected set. Therefore, the lower address bits A2
through A11 serve as the "cache address" which
directly selects one of the line locations in each of the
ways A1 and B1 of the C1 cache.
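
The field boundaries described above can be sketched with shifts and
masks. The function name split_c1_address is hypothetical; the bit
positions follow the 2-way C1 example (tag in A12 to A31, set address
in A5 to A11, line address in A2 to A4), and a 32-bit byte address is
assumed as input.

```python
def split_c1_address(address):
    """Split a 32-bit address into (tag, set_index, line) for C1."""
    line = (address >> 2) & 0x7          # A2..A4: one of 8 lines
    set_index = (address >> 5) & 0x7F    # A5..A11: one of 128 sets
    tag = (address >> 12) & 0xFFFFF      # A12..A31: 20-bit tag
    return tag, set_index, line


# Example address built from known field values.
example = (0xABCDE << 12) | (100 << 5) | (5 << 2)
print(split_c1_address(example))
```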

When the microprocessor initiates a memory
read cycle, the address bits A5 to A11 are used to
select one of the 128 sets, and the address bits A2 to
A4 are used to select one of the respective line valid
bits within each entry in the respective directories
DA1 and DB1 from the selected set. The lower
address bits A2 to A11 are also used to select the
appropriate line in the C1 cache. The cache controller
compares the upper address bits, or tag field, of the
requested memory address with each of the tags
stored in the selected directory entries of the selected
set for each of the cache ways A1 and B1. At the same
time, both the tag valid and line valid bits are checked.
If the upper address bits match one of the tags, and
if both the tag valid bit and the appropriate line valid
bit are set for the respective cache way directory
where the tag match was made, the result is a cache
hit and the corresponding cache way is directed to
drive the selected line of data onto the data bus.

A miss can occur in either of two ways. The first
is known as a line miss and occurs when the upper
address bits of the requested memory address match
one of the tags in either of the directories DA1 or DB1
of the selected set and the respective tag valid bit is
set, but the respective line valid bit(s) where the
requested data resides are clear. The second is called
a tag miss and occurs when either the upper address
bits of the requested memory address do not match
either of the respective tags in directories DA1 or DB1
of the selected set where the requested data is
located, or the respective tag valid bit for each of the
directories DA1 or DB1 is not set.
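
The hit, line miss and tag miss cases can be sketched as follows. This is a minimal illustration, not the controller logic itself; the DirectoryEntry class and the classify_read function are hypothetical names, and the directory is reduced to the entries of one selected set.

```python
from dataclasses import dataclass, field


@dataclass
class DirectoryEntry:
    """One directory entry for a set: a tag, a tag valid bit, eight line valid bits."""
    tag: int = 0
    tag_valid: bool = False
    line_valid: list = field(default_factory=lambda: [False] * 8)


def classify_read(entries, tag, line):
    """entries holds one DirectoryEntry per way (A1, B1) for the selected set."""
    for way, entry in enumerate(entries):
        if entry.tag_valid and entry.tag == tag:
            if entry.line_valid[line]:
                return ("hit", way)        # tag matches, tag valid, line valid
            return ("line miss", way)      # tag matches, but the line valid bit is clear
    return ("tag miss", None)              # no valid matching tag in either way


way_a = DirectoryEntry(tag=5, tag_valid=True)
way_a.line_valid[2] = True
print(classify_read([way_a, DirectoryEntry()], tag=5, line=2))
```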

The C1 cache controller 30 includes a replacement
algorithm that determines the cache way, A1
or B1, in which to place new data. The replacement
algorithm used is a least recently used (LRU)
algorithm that places new data in the cache way that
was least recently accessed by the processor for
data. This is because, statistically, the way most
recently used is the way most likely to be needed
again in the near future. The C1 cache controller 30
includes a directory 70 that holds an LRU bit for each
set in the cache, and the LRU bit is pointed away from
the cache way that was most recently accessed by the
processor. Therefore, if data requested by the processor
resides in way A1, then the LRU bit is pointed
toward B1. If the data requested by the processor
resides in way B1, then the LRU bit is pointed toward
A1.
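
A minimal sketch of the per-set LRU bit follows. The class name LruDirectory, the encoding of way A1 as 0 and way B1 as 1, and the initial value of each bit are assumptions made for illustration.

```python
class LruDirectory:
    """One LRU bit per set; the bit names the way to replace next (0 = A1, 1 = B1)."""

    def __init__(self, num_sets=128):
        self.bits = [0] * num_sets  # assumed initial state: point toward A1

    def touch(self, set_index, way):
        # Point the bit away from the way that was just accessed.
        self.bits[set_index] = 1 - way

    def victim(self, set_index):
        return self.bits[set_index]


lru = LruDirectory()
lru.touch(3, 0)       # processor data found in way A1 of set 3
print(lru.victim(3))  # 1, so the bit now points toward B1
```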

In the 4-way set-associative C1 cache organization
of the preferred embodiment, a more elaborate
LRU or pseudo-LRU replacement algorithm can be
used in the C1 cache controller 30. The choice of a
replacement algorithm is generally irrelevant to the
present invention, and it is suggested that an LRU or
pseudo-LRU algorithm be chosen to optimize the
particular cache design used in the chosen embodiment.
One replacement algorithm that can be used in the C1
cache controller 30 in the 4-way set-associative C1
cache organization of the preferred embodiment is a
pseudo-LRU algorithm which operates as follows.
The 4-way set-associative C1 cache includes four
ways of memory referred to as W0, W1, W2, and W3.
Three bits, referred to as X0, X1, and X2, are located
in the C1 cache controller 30 and are defined for a
respective set in each of the ways in the 4-way C1 cache.
These bits are called LRU bits and are updated for
every hit or replace in the C1 cache. If the most recent
access in the respective set was to way W0 or way
W1, then X0 is set to 1 or a logic high value. Bit X0 is
set to 0 or a logic low value if the most recent access
was to way W2 or way W3. If X0 is set to 1 and the
most recent access between way W0 and way W1
was to way W0, then X1 is set to 1; otherwise X1 is
set to 0. If X0 is set to 0 and the most recent access
between way W2 and way W3 was to way W2, then
X2 is set to 1; otherwise X2 is set to 0.

The pseudo-LRU replacement mechanism works
in the following manner. When a line must be replaced
in the 4-way C1 cache, the C1 cache controller 30
uses the X0 bit to first select the respective ways W0
and W1 or W2 and W3 where the particular line
location candidate that was least recently used is
located. The C1 cache controller then utilizes the X1
and X2 bits to determine which of the two selected
cache ways, W0 and W1 or W2 and W3, holds the
respective line location that was least recently used, and
this line location is marked for replacement.
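
The update and selection rules for the X0, X1 and X2 bits can be sketched for a single set as follows. The class name PseudoLru, the all-zero starting state, and the exact mapping from bit values to the way marked for replacement are one reading of the description above, not a definitive statement of the controller's logic.

```python
class PseudoLru:
    """Three LRU bits (X0, X1, X2) for one set of a 4-way cache."""

    def __init__(self):
        self.x0 = self.x1 = self.x2 = 0  # assumed initial state

    def touch(self, way):
        """Update the bits on a hit or replace in way W0..W3."""
        if way in (0, 1):
            self.x0 = 1                      # most recent access was to W0 or W1
            self.x1 = 1 if way == 0 else 0   # 1 if W0 was more recent than W1
        else:
            self.x0 = 0                      # most recent access was to W2 or W3
            self.x2 = 1 if way == 2 else 0   # 1 if W2 was more recent than W3

    def victim(self):
        """Select the way to replace: X0 picks the pair, X1 or X2 picks the way."""
        if self.x0 == 1:                     # W0/W1 used recently, so replace in W2/W3
            return 3 if self.x2 == 1 else 2
        return 1 if self.x1 == 1 else 0      # otherwise replace in W0/W1


plru = PseudoLru()
for way in (0, 1, 2, 3):
    plru.touch(way)
print(plru.victim())  # 0
```

Because only three bits are kept per set, the selection approximates true LRU ordering rather than reproducing it in every access sequence.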

The C1 cache controller 30 broadcasts its LRU
information to the C2 cache controller 32 on C1 and
C2 cache read misses and on processor writes
according to the present invention. In this manner, the
C2 cache controller 32 is able to place the copy of
data that it receives from either the main memory 26
on read misses or from the processor 20 on processor
writes into the C2 cache way corresponding to the C1
cache way where the C1 cache controller placed the
copy of data, thereby achieving multilevel inclusion. In
addition, the C1 cache controller 30 ignores its LRU
replacement algorithm on a C1 cache read miss and
a C2 cache read hit so that the C1 cache controller 30
can place the copy of data that it receives from the C2
cache controller 32 in the C1 cache way corresponding
to the C2 cache way where the read hit occurred.

The 2-way set-associative C2 cache is organized
in a manner similar to that of the 2-way set-associative
C1 cache. In the preferred embodiment, the C2 cache
preferably comprises 512 kbytes of cache data RAM.
Referring now to Figure 3, each cache way A2 and B2
in the C2 cache is 256 kbytes in size and includes
8192 sets of eight lines each. The line size in the C2
cache is one 32-bit doubleword, which is the same as
that of the C1 cache. The 4 Gigabyte main memory 26
is organized into 2^14 conceptual pages with each
conceptual page being 256 kbytes in size. The number of
conceptual pages of main memory 26 for the C2
cache is less than that of the C1 cache because the
conceptual page size for the C2 cache is greater than
that of the C1 cache. As in the C1 cache, each line
location or page offset in main memory 26 maps to a
similarly located line in each of the cache ways A2 and
B2.

The C2 cache controller 32 includes cache way
directories DA2 and DB2. The cache way directories
DA2 and DB2 have set entries which include 14-bit
tag fields, as opposed to the 20-bit tag fields in the
entries of the C1 cache directories DA1 and DB1. The
14-bit tag fields hold the upper address bits, address
bits A18 to A31, that address the appropriate 256
kbyte conceptual page in main memory 26 where the
data in the respective set of the cache is located. The
remaining address bits, A2 to A17, can be partitioned
into a set address field comprising thirteen bits, A5 to
A17, which are used to select one of the 8192 sets
in the C2 cache, and a line address field comprising
3 bits, A2 to A4, which are used to select an individual
line from the eight lines in the selected set. Therefore,
in the C2 cache the lower address bits A2 to A17 serve
as the "cache address" which directly selects one of
the line locations in each of the ways A2 and B2 of the
C2 cache.
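
The set counts, tag widths and page counts quoted for the two caches follow from the way size. The sketch below derives them, assuming a 32-bit address and a C1 way size of 4 kbytes (128 sets of eight 4-byte lines); the helper name cache_geometry is hypothetical.

```python
def cache_geometry(way_bytes, lines_per_set=8, line_bytes=4, address_bits=32):
    """Derive sets per way, tag width and conceptual page count from the way size."""
    sets = way_bytes // (lines_per_set * line_bytes)
    offset_bits = way_bytes.bit_length() - 1   # log2 of a power-of-two way size
    tag_bits = address_bits - offset_bits
    return {"sets": sets, "tag_bits": tag_bits, "pages": 2 ** tag_bits}


c1 = cache_geometry(4 * 1024)     # assumed 4-kbyte C1 way
c2 = cache_geometry(256 * 1024)   # 256-kbyte C2 way
print(c1)  # {'sets': 128, 'tag_bits': 20, 'pages': 1048576}
print(c2)  # {'sets': 8192, 'tag_bits': 14, 'pages': 16384}
```

The ratio of the two way sizes, 256 kbytes to 4 kbytes, is 64, which is why 64 line locations in a C2 way share each line location of the corresponding C1 way.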

The C2 cache controller 32 according to the present
invention does not generally require a replacement
algorithm because the C2 cache receives new
data only on C1 and C2 cache read misses and on
processor writes, and in those instances the C2 cache
controller receives the way location from the C1 cache
controller and must fill the corresponding C2 cache
way. Therefore, the C2 cache controller 32 does not
need a replacement algorithm because the respective
C2 cache way where data is placed is determined by
the data's way location in the C1 cache. However, if
the C2 cache has more ways than the C1 cache,
then the C2 cache controller 32 will require use of a
replacement algorithm. In this instance, a C1 cache
way will correspond to two or more C2 cache ways.
Accordingly, when the C1 cache controller 30 broadcasts
the C1 cache way location to the C2 cache controller
32, the C2 cache controller 32 will need a
replacement algorithm in order to decide between the
multiple C2 cache ways that correspond to the C1
cache way location in which to place the received
data.

The 2-way set-associative C1 and C2 caches are
aligned on a "way" basis such that the ways A1 and
B1 in the C1 cache have a one-to-one correspondence
with the ways A2 and B2, respectively, of the C2
cache. In this manner, a page offset from main memory
26 that is placed in the respective line location in
a C1 cache way A1 or B1 has only one possible location
in the corresponding C2 cache way A2 or B2,
respectively. Conversely, a respective line location in a
C2 cache way A2 or B2 has only one possible location
in the corresponding C1 cache way A1 or B1,
respectively. However, because the C2 cache is 64 times as
large as the C1 cache, each of the C2 cache ways A2
or B2 holds 64 lines of data that each correspond to,
or could be located in, a single line or page offset
location in the corresponding C1 cache way A1 or B1.
Therefore, the C2 cache controller 32 according to the
present invention includes inclusion bits 80 for each
of its respective lines. This enables the C2 cache
controller 32 to remember whether a copy of data from the
respective C2 cache line also resides in the
corresponding C1 cache line location.

The use of inclusion bits 80 allows the C2 cache
controller 32 to remember which of the 64 lines of data
in the respective C2 cache way A2 or B2 that
correspond to a single C1 cache way location holds a copy
of data that is duplicated in that C1 cache location. For
example, if a line in the C2 cache receives a copy of
data from main memory 26 that was also placed in the
C1 cache, or if a line in the C2 cache provides a copy
of data that is placed in the C1 cache, then an inclusion
bit for the respective C2 cache line is true or set
to a logic high value, signifying that the respective C2
cache line holds a copy of data that is duplicated in the
respective C1 cache location. The other 63 line locations
in the C2 cache which correspond to the respective
C1 cache location involved in the above operation
have their inclusion bits cleared as a reminder that the
copy of data that they hold is not duplicated in a C1
cache location. This is important because one of
these other 63 line locations may hold data that was
previously duplicated in the respective C1 cache location
before one of the operations mentioned above
placed new data in the respective C1 cache location,
and therefore one of these 63 locations may have its
inclusion bit set. The only instance where one of these
other 63 C2 cache locations would not have its inclusion
bit set is when the respective C2 cache line location
that was involved in the above operation and had
its inclusion bit set also held the copy of data that was
duplicated in the respective C1 cache location before
the operation took place and therefore already had its
inclusion bit set.
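
The set-one, clear-63 behavior of the inclusion bits can be sketched for a single C2 way as follows. The flat line indexing (set index and line index combined into one number) and the names aliases and mark_included are assumptions made for illustration.

```python
C1_LINES_PER_WAY = 128 * 8    # 1024 line locations in a C1 way
C2_LINES_PER_WAY = 8192 * 8   # 65536 line locations in a C2 way


def aliases(c2_index):
    """All C2 line indices in a way that map to the same C1 line location."""
    c1_index = c2_index % C1_LINES_PER_WAY
    return range(c1_index, C2_LINES_PER_WAY, C1_LINES_PER_WAY)


def mark_included(inclusion_bits, c2_index):
    """Set the inclusion bit for c2_index and clear it on the other 63 aliases."""
    for index in aliases(c2_index):
        inclusion_bits[index] = False
    inclusion_bits[c2_index] = True


bits = [False] * C2_LINES_PER_WAY
mark_included(bits, 5)
mark_included(bits, 5 + C1_LINES_PER_WAY)  # new data takes over the same C1 location
print(len(aliases(5)), bits[5], bits[5 + C1_LINES_PER_WAY])  # 64 False True
```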

Referring now to Figures 4A and 4B, a flowchart
describing the operation of the C1 and C2 caches
according to the present invention is shown. It is
understood that numerous of these operations may
occur concurrently, but a flowchart format has been
chosen to simplify the explanation of the operation.
For clarity, the flowchart is shown in two portions, with
the interconnections between Figures 4A and 4B
designated by reference to the circled numbers one
and two. Step 100 represents that the computer system
S is operating or turned on. In some computer
systems, the processor is required to have control of
the system bus 24 before it may issue memory reads
or writes. However, in the system S according to the
preferred embodiment the processor 20 is not
required to have control of the system bus 24 when it
issues memory reads or writes, but rather the processor
20 can operate out of the C1 cache and the C2
cache without requiring use of the system bus 24 until
a C1 and C2 cache read miss or a processor write
beyond any posting depth occurs.

When the processor 20 attempts a main memory
read in step 102, the C1 cache controller 30 first
checks the C1 cache in step 104 to determine if a copy
of the requested main memory data resides in the C1
cache. If a copy of the requested data does not reside
in the C1 cache, then a C1 cache read miss occurs in
step 106, and the read operation is passed on to the
C2 cache, where the C2 cache controller 32 then
checks the C2 cache in step 108. If a copy of the
requested data does not reside in the C2 cache, then
a C2 cache read miss occurs in step 110, and the
operation is passed on to the system memory controller
to obtain the necessary data from main memory
26.

Main memory 26 provides the requested data to
the C1 cache, the C2 cache and the processor 20 in
step 112, and the C1 cache controller 30 places the
data into one of its cache ways A1 or B1 according to
its particular replacement algorithm in step 114. The
data is placed in the C1 cache because of the statistical
likelihood that this data will be requested again
soon by the processor 20. The C1 cache controller 30
during this period has been broadcasting to the C2
cache controller 32 the particular C1 cache way A1 or
B1 in which it is placing the data, represented in step
118, so that the C2 cache controller 32 can place the
data in the corresponding C2 cache way A2 or B2 in
step 120. The C2 cache controller 32 sets the inclusion
bit on the respective C2 cache memory location
where the data is stored in step 122, signifying that a
copy of the data in this location also resides in the C1
cache. The C2 cache controller 32 also clears the
inclusion bits on the other 63 C2 cache locations that
correspond to the same page offset location in the C1
cache in step 124 to signify that a copy of the data in
these locations does not reside in the C1 cache. Upon
completion of the memory read, the computer system
returns to step 100.

The above sequence of events occurs on a C1
and C2 cache read miss and also when the computer
system S is first turned on because the C1 and C2
caches are both empty at power on of the computer
system S and C1 and C2 cache misses are therefore
guaranteed. The majority of processor memory reads
that occur immediately after power on of the computer
system S will be C1 and C2 cache misses because
the C1 and C2 caches are relatively empty at this time.
In this manner, the C1 and C2 caches are filled with
data and align themselves on a "way" basis wherein
data in a particular way A1 or B1 in the C1 cache is
guaranteed to be located in the corresponding cache
way A2 or B2 in the C2 cache. In addition, when the
computer system S has been operating for a while and
a C1 and C2 cache read miss occurs, the resulting line
fills of data in the C1 and C2 caches are performed as
described above and therefore the "way" alignment is
maintained.

When the processor 20 initiates a main memory
read in step 102 and the C2 cache controller 32
checks the C2 cache in step 108 after a C1 cache
miss occurs in step 106, and a copy of the requested
data resides in the C2 cache, then a C2 cache hit
occurs in step 130. The C2 cache controller 32 provides
the requested data to the processor 20 in step
132, and also provides the data to the C1 cache in
step 134 due to the statistical likelihood that this data
will be requested again soon by the processor 20. The
C2 cache controller 32 informs the C1 cache controller
30 as to the particular C2 cache way A2 or B2 in
which the data is located in the C2 cache in step 138
so that the C1 cache controller 30 can place the data
in the corresponding C1 cache way A1 or B1 in step
138. This requires that the C1 cache controller 30
disregard its normal LRU replacement algorithm because
the replacement algorithm may choose a different C1
cache way A1 or B1 in which to place the data. In this
manner, the C1 and C2 caches maintain their "way"
alignment without a requirement for the C2 cache
controller 32 to transfer data between the ways in the C2
cache. The C2 cache controller 32 sets the inclusion
bit on the C2 cache location where the requested data
is located in step 140, signifying that a copy of this
data also resides in the C1 cache. The C2 cache
controller 32 also clears the other 63 inclusion bits on the
C2 cache memory locations that correspond to the
same page offset location to signify that a copy of the
data in these locations does not reside in the C1
cache. The computer system S is then finished with
the memory read and returns to step 100.

When the processor 20 initiates a memory read
in step 102 and checks the contents of the C1 cache
in step 104 to determine if a copy of the requested
data resides there, and a copy of the requested data
does reside in the C1 cache, then a C1 cache hit takes
place in step 150. The C1 cache controller 30 provides
the requested data to the processor 20 in step
152, and operation of the computer system S is
resumed in step 100. Since multilevel inclusion exists
in the cache subsystem, the C2 cache is guaranteed
to have a copy of the data that the C1 cache controller
30 provided to the processor 20, and no transfer of
data from the C1 cache controller 30 to the C2 cache
controller 32 is necessary when a C1 cache read hit
takes place.
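
The three read cases (C1 and C2 miss, C1 miss with C2 hit, and C1 hit) can be traced with a small behavioral sketch. It is a simplified model under stated assumptions, not the controllers' implementation: the class TwoLevelCache and its members are hypothetical, each way is stored as a sparse dictionary, only tags are tracked (no data), and the LRU bit is assumed to point at way 0 at power on.

```python
C1_INDEX_BITS = 10  # A2..A11 select a line location within a C1 way
C2_INDEX_BITS = 16  # A2..A17 select a line location within a C2 way
C1_MASK = (1 << C1_INDEX_BITS) - 1
C2_MASK = (1 << C2_INDEX_BITS) - 1


class TwoLevelCache:
    def __init__(self):
        self.c1 = [{}, {}]              # per way: C1 index -> cached line number
        self.c2 = [{}, {}]              # per way: C2 index -> cached line number
        self.included = [set(), set()]  # per way: C2 indices with inclusion bit set
        self.lru = {}                   # C1 index -> way the LRU bit points to

    def _fill_c1(self, way, line):
        i1, i2 = line & C1_MASK, line & C2_MASK
        self.c1[way][i1] = line
        self.lru[i1] = 1 - way
        # Clear the inclusion bits of C2 locations sharing this C1 location,
        # then set the bit on the location that now backs the C1 copy.
        self.included[way] = {i for i in self.included[way] if i & C1_MASK != i1}
        self.included[way].add(i2)

    def read(self, addr):
        line = addr >> 2
        i1, i2 = line & C1_MASK, line & C2_MASK
        for way in (0, 1):
            if self.c1[way].get(i1) == line:
                self.lru[i1] = 1 - way
                return "C1 hit"
        for way in (0, 1):
            if self.c2[way].get(i2) == line:
                self._fill_c1(way, line)  # C1 ignores its LRU bit, uses the C2 way
                return "C2 hit"
        way = self.lru.get(i1, 0)         # both miss: C1 picks the way, C2 follows
        self.c2[way][i2] = line
        self._fill_c1(way, line)
        return "miss"

    def inclusion_holds(self):
        """True if every C1 line is in the same way of C2 with its bit set."""
        return all(
            self.c2[way].get(line & C2_MASK) == line
            and (line & C2_MASK) in self.included[way]
            for way in (0, 1)
            for line in self.c1[way].values()
        )


cache = TwoLevelCache()
a, b, c = 0x1000, 0x2000, 0x3000  # three addresses sharing one C1 line location
print([cache.read(x) for x in (a, b, c, a, a)])
print(cache.inclusion_holds())
```

In this trace the fourth read finds the data in C2 way 0 and places it in C1 way 0 even though the LRU bit points to way 1, so the line cached in C1 way 1 is left in place and the way alignment is kept.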

The cache architecture of the C1 cache in the preferred
embodiment is preferably a write-through
cache architecture and the cache architecture of the
C2 cache is preferably a write-back cache architecture.
However, the use of other cache architectures
for the C1 cache and the C2 cache is also contemplated.
When the processor 20 performs a memory
write operation, the data is written into the C1 cache,
regardless of whether the processor write is a C1
cache write hit or write miss. In addition, processor
writes initiate external write bus cycles to write the
respective data into the C2 cache. When this occurs,
the C1 cache controller 30 broadcasts the particular
C1 cache way where the data was placed so that the
C2 cache controller 32 can place the data in the
corresponding C2 cache way. Therefore, the C1 and C2
caches allocate write misses according to the present
invention. It is preferred that the C1 and C2 caches either both
allocate write misses or both do not allocate write misses.
If the C1 cache were to not allocate writes and
the C2 cache were to allocate writes, the designs
would be more complicated. The C2 cache controller
32 would require an LRU algorithm and would need
to ensure that if the C2 cache controller LRU algorithm
selected a particular C2 cache way that contains a
copy of data that is duplicated in the C1 cache, the
LRU algorithm would be overridden or the caching
aborted so that multilevel inclusion remained guaranteed.

Referring now to Figure 5, when the intelligent
bus master 28 gains control of the system bus 24 in
step 200, the C2 cache controller 32 watches or
"snoops" the system bus 24 in step 202 to see if the
bus master 28 performs any writes, and reads in the
case of a write-back cache, to main memory 26, and,
if so, which memory location is being accessed. The
C2 cache controller 32 can perform the snooping
responsibilities for both the C1 and C2 caches because
the C2 cache is guaranteed to have a copy of all the
data that resides in the C1 cache due to the multilevel
inclusion.

If the bus master 28 writes to main memory 26 in
step 204 and a write snoop hit occurs in the C2 cache
in step 206, then the C2 cache controller 32 checks
the inclusion bit for the respective C2 cache location
to see whether the C1 cache controller 30 must also
snoop the memory access in step 208. If the inclusion
bit is not set in step 208, then a copy of the data from
the memory location being written to does not reside
in the C1 cache, and the C1 cache controller 30 is left
alone. In this case, the C2 cache receives the new
copy of data in step 210 and the C2 cache controller
32 resumes its snooping duties in step 202. If the
inclusion bit on the C2 cache memory location is set
in step 208 after a snoop hit in step 206, then the C2
cache controller directs the C1 cache controller 30 to
snoop that particular memory access in step 212. In
step 214, the C1 and C2 caches each receive a copy
of the new data, and the C2 cache controller 32
resumes its snooping duties in step 202. If a snoop
miss occurs in step 206 after the bus master 28 writes
to a memory location in step 204, then the C2 cache
controller 32 resumes its snooping duties in step 202.
The C2 cache controller 32 continues to snoop the
system bus 24 in step 202 until the bus master 28 is
no longer in control of the system bus 24.

If the bus master 28 reads a main memory location
in step 204 and a read snoop hit occurs in the C2
cache in step 220, then the C2 cache controller 32
checks the C2 cache location in step 222 to determine
if it is the owner of the respective memory location. If
not, then main memory 26 or another source services
the data request, and the C2 cache controller 32
resumes snooping in step 202. If the C2 cache controller
32 is the owner of the memory location, then the
C2 cache controller 32 provides the requested data to
main memory 26 in step 224. The bus master 28 reads
this data in step 226 when the data has been placed
on the data bus, this being referred to as snarfing. The
C2 cache controller 32 then resumes its snooping
duties in step 202. If a snoop miss occurs in step 220
after the bus master 28 reads a memory location in
step 204, then the C2 cache controller 32 resumes its
snooping duties in step 202.
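
The snoop decisions for bus master writes and reads can be summarized in a short sketch. The function name snoop, the dictionary layout, and the key names "inclusion" and "owner" are hypothetical; the returned strings only label the outcomes described above.

```python
def snoop(c2_lines, line_address, is_write):
    """Return the action taken when the bus master accesses line_address.

    c2_lines maps a cached line address to a dict with two assumed flags:
    "inclusion" (a copy also resides in C1) and "owner" (C2 owns the line).
    """
    entry = c2_lines.get(line_address)
    if entry is None:
        return "snoop miss"
    if is_write:
        if entry["inclusion"]:
            return "C1 directed to snoop; C1 and C2 receive new data"
        return "C2 receives new data; C1 left alone"
    if entry["owner"]:
        return "C2 supplies data"
    return "main memory supplies data"


lines = {
    0x100: {"inclusion": True, "owner": False},
    0x200: {"inclusion": False, "owner": True},
}
print(snoop(lines, 0x100, is_write=True))
print(snoop(lines, 0x200, is_write=False))
```

Only the first outcome involves the C1 cache controller; in every other case the C1 cache controller is not disturbed.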

In this manner, the C1 cache controller 30 can
neglect its snooping duties until the C2 cache controller
32 determines that a snoop hit on data held in the
C1 cache has actually occurred. This allows the processor
20 to operate more efficiently out of the C1
cache while it does not have control of the system bus
24 because the C1 cache controller 30 only has to
snoop the system bus 24 when a C1 cache snoop hit
occurs, not on every memory write as it normally
would.

The foregoing disclosure and description of the 
invention are illustrative and explanatory thereof, and 
various changes in the size, components, construction
and method of operation may be made without
departing from the spirit of the invention. 

Claims 

1. A method for achieving multilevel inclusion in a
computer system having a microprocessor, a system
bus, a first level set associative cache memory
including a first number of ways, a first level
cache controller, a second level set associative
cache including a number of ways equal to or greater
than the first number of ways of the first level
cache, wherein each of the ways in the first level
cache corresponds to at least one way in the second
level cache, a second level cache controller,
means coupled to the second level cache controller
for setting and clearing an inclusion bit on data
inside the second level cache, means coupled to
the first and second level cache controllers for
communicating and transmitting data between
the first level and second level caches, a bus
master device, and random access memory, the
method comprising:

the first level cache controller communi- 
cating to the second level cache controller the 
particular first level cache way in which a copy of 
data received from the random access memory is 
placed on a first level and second level cache 
read miss; 

the second level cache controller placing
the copy of data received from the random access
memory in the second level cache way corresponding
to the first level cache way communicated
by the first level cache controller on the first
level and second level cache read miss;

the second level cache controller communicating
to the first level cache controller the
particular second level cache way where a copy
of data is located on a first level cache read miss
and second level cache read hit;

the first level cache controller placing the 
copy of data transmitted from the second level 
cache controller to the processor in the corre- 
sponding first level cache way; and 

the second level cache controller setting
an inclusion bit on the second level cache location
of the copy of data and clearing inclusion bits
on any other second level cache locations that
correspond to the first level cache location where
the first level cache controller placed the copy of
data.

2. The method of claim 1, wherein the first level
cache controller includes a replacement
algorithm that determines the first level cache
way in which to place a received copy of data, the
step of the first level cache controller copying the
data into the first level cache way corresponding
to the second level cache way including:

the first level cache controller disregarding
its replacement algorithm on first level cache read
miss and second level cache read hit cases.

3. The method of claim 1, further comprising:

the first level cache controller communicating
to the second level cache controller the
particular first level cache way in which a copy of
received data is placed on a processor write; and

the second level cache controller placing
the copy of received data in the second level
cache way corresponding to the first level cache
way communicated by the first level cache
controller.

4. The method of claim 1, wherein greater than one
way in the first level cache cannot correspond to
one cache way in the second level cache and greater
than one cache way in the second level cache can
correspond to one way in the first level cache.

5. The method of claim 1, further comprising:

the second level cache controller snooping
the system bus when the processor does not
have control of the system bus to determine if the
bus master device is writing to a cached memory
location;

the second level cache controller checking
the inclusion bit on a second level cache location
where a second level cache write snoop hit
occurs to determine if a copy of data from the random
access memory location being written to
resides in the first level cache; and

the second level cache controller directing
the first level cache controller to snoop the system
bus if said inclusion bit is set.

6. The method of claim 5, wherein the second level
cache is a write-back cache, the method further
comprising:

the second level cache controller snooping
the system bus when the processor does not
have control of the system bus to determine if the
bus master device is reading a cached memory
location;

the second level cache controller determining
if the second level cache has an updated
version of the data residing in the requested
memory location on a second level cache read
snoop hit;

the second level cache controller providing
the requested data to main memory if the second
level cache has an updated version of the data;
and

the bus controller reading the requested
data provided by the second level cache controller.

7. An apparatus for achieving multilevel inclusion in 
a computer system, comprising: 
a system bus; 

a microprocessor coupled to said system
bus;

a first level cache memory coupled to said 
microprocessor and including a first number of 
ways; 

a first level cache controller coupled to
said first level cache, said microprocessor and
said system bus and including an output for transmitting
way information and an input for receiving
way information;

a second level cache of a size greater than
or equal to the size of the first level cache which
includes a number of ways equal to or greater
than the first number of ways of the first level
cache, wherein each of the ways in the first level
cache corresponds to at least one way in the second
level cache and which includes inclusion
information indicating presence of data in the second
level cache that is duplicated in the first level
cache;

a second level cache controller coupled to 
said system bus, said second level cache, said 
microprocessor, and said first level cache control- 
ler and including an input coupled to said first 
level cache controller way information output for 
receiving way information and an output coupled 
to said first level cache controller way information 
input for transmitting way information; and 

random access memory coupled to said 
system bus; 

wherein on a first and second level cache 
read miss said first level cache controller trans- 
mits way information to said second level cache 
controller and said second level cache controller 
places received data in a way of the second level 
cache corresponding to the received way infor- 
mation, 

wherein on a first level cache read miss
and a second level cache read hit said second
level cache controller transmits way information
to said first level cache controller and said first
level cache controller places received data in a
way of the first level cache corresponding to the
received way information, and

wherein said second level cache controller
sets the inclusion bit in the second level cache
location which contains the data placed in the first
level cache and clears the inclusion bits of any
other second level cache locations which correspond
to the first level cache location where the
data was placed.

8. The apparatus of claim 7, wherein said first level
cache controller includes a replacement means
that determines the first level cache way in
which to place a received copy of data, wherein
said first level cache controller disregards said
replacement means on first level cache read miss
and second level cache read hit cases.

9. The apparatus of claim 7, wherein greater than
one way in the first level cache cannot correspond
to one cache way in the second level cache and
greater than one way in the second level cache
can correspond to one way in the first level cache.

10. The apparatus of claim 7, wherein on a processor
write said first level cache controller transmits
way information to said second level cache controller
and said second level cache controller
places received data in a way of the second level
cache corresponding to the received way information.

11. The apparatus of claim 7, further comprising:

a bus master device coupled to said system
bus; and

wherein said first level cache controller
includes means for snooping the system bus
when said microprocessor does not have control
of said system bus to determine if the bus master
device is writing to a random access memory
location that is cached in the first level cache, and

wherein said second level cache controller
further includes:

means for snooping the system bus when
said microprocessor does not have control of said
system bus to determine if the bus master device
is writing to a random access memory location
that is cached in the second level cache;

means for checking the inclusion bit on a
second level cache location where a second level
cache write snoop hit occurs to determine if a
copy of data from said random access memory
location being written to also resides in said first
level cache; and

means coupled to said first level cache
controller which directs said first level cache controller
to snoop the system bus if said inclusion bit
is set.

12. The apparatus of claim 11, further comprising:

said second level cache being a write-back
cache, wherein said second level cache controller
further includes:

means for snooping the system bus when
said microprocessor does not have control of said
system bus to determine if the bus master device
is reading a random access memory location that
is cached in the second level cache;

means for determining whether the second
level cache includes an updated version of the
data residing in the requested memory location
when a second level cache read snoop hit occurs;
and

means for providing the requested data to
main memory if the second level cache has an
updated version of the data, wherein the bus controller
reads the requested data provided by the
second level cache controller.